{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "djUvWu41mtXa"
      },
      "source": [
        "##### Copyright 2019 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "su2RaORHpReL"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NztQK2uFpXT-"
      },
      "source": [
        "# TensorBoard Scalars: Logging training metrics in Keras\n",
        "\n",
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://www.tensorflow.org/tensorboard/scalars_and_keras\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/tensorboard/blob/master/docs/scalars_and_keras.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/tensorboard/blob/master/docs/scalars_and_keras.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eDXRFe_qp5C3"
      },
      "source": [
        "## Overview\n",
        "\n",
        "Machine learning invariably involves understanding key metrics such as loss and how they change as training progresses. These metrics can help you understand if you're [overfitting](https://en.wikipedia.org/wiki/Overfitting), for example, or if you're unnecessarily training for too long. You may want to compare these metrics across different training runs to help debug and improve your model.\n",
        "\n",
        "TensorBoard's **Time Series Dashboard** allows you to visualize these metrics using a simple API with very little effort. This tutorial presents very basic examples to help you learn how to use these APIs with TensorBoard when developing your Keras model. You will learn how to use the Keras TensorBoard callback and TensorFlow Summary APIs to visualize default and custom scalars."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dG-nnZK9qW9z"
      },
      "source": [
        "## Setup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3U5gdCw_nSG3"
      },
      "outputs": [],
      "source": [
        "# Load the TensorBoard notebook extension.\n",
        "%load_ext tensorboard"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1qIKtOBrqc9Y"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "TensorFlow version:  2.8.2\n"
          ]
        }
      ],
      "source": [
        "from datetime import datetime\n",
        "from packaging import version\n",
        "\n",
        "import tensorflow as tf\n",
        "from tensorflow import keras\n",
        "from keras import backend as K\n",
        "\n",
        "import numpy as np\n",
        "\n",
        "print(\"TensorFlow version: \", tf.__version__)\n",
        "assert version.parse(tf.__version__).release[0] >= 2, \\\n",
        "    \"This notebook requires TensorFlow 2.0 or above.\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "UbFM4dlnGB3S"
      },
      "outputs": [],
      "source": [
        "# Clear any logs from previous runs\n",
        "!rm -rf ./logs/ "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6YDAoNCN3ZNS"
      },
      "source": [
        "## Set up data for a simple regression\n",
        "\n",
        "You're now going to use [Keras](https://www.tensorflow.org/guide/keras) to calculate a regression, i.e., find the best line of fit for a paired data set. (While using neural networks and gradient descent is [overkill for this kind of problem](https://stats.stackexchange.com/questions/160179/do-we-need-gradient-descent-to-find-the-coefficients-of-a-linear-regression-mode), it does make for a very easy to understand example.)\n",
        "\n",
        "You're going to use TensorBoard to observe how training and test **loss** change across epochs. Hopefully, you'll see training and test loss decrease over time and then remain steady.\n",
        "\n",
        "First, generate 1000 data points roughly along the line *y = 0.5x + 2*. Split these data points into training and test sets. Your hope is that the neural net learns this relationship."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "j-ryO6OxnQH_"
      },
      "outputs": [],
      "source": [
        "data_size = 1000\n",
        "# 80% of the data is for training.\n",
        "train_pct = 0.8\n",
        "\n",
        "train_size = int(data_size * train_pct)\n",
        "\n",
        "# Create some input data between -1 and 1 and randomize it.\n",
        "x = np.linspace(-1, 1, data_size)\n",
        "np.random.shuffle(x)\n",
        "\n",
        "# Generate the output data.\n",
        "# y = 0.5x + 2 + noise\n",
        "y = 0.5 * x + 2 + np.random.normal(0, 0.05, (data_size, ))\n",
        "\n",
        "# Split into test and train pairs.\n",
        "x_train, y_train = x[:train_size], y[:train_size]\n",
        "x_test, y_test = x[train_size:], y[train_size:]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Je59_8Ts3rq0"
      },
      "source": [
        "## Training the model and logging loss\n",
        "\n",
        "You're now ready to define, train and evaluate your model. \n",
        "\n",
        "To log the *loss* scalar as you train, you'll do the following:\n",
        "\n",
        "1.   Create the Keras [TensorBoard callback](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/TensorBoard)\n",
        "2.   Specify a log directory\n",
        "3.   Pass the TensorBoard callback to Keras' [Model.fit()](https://www.tensorflow.org/api_docs/python/tf/keras/models/Model#fit).\n",
        "\n",
        "TensorBoard reads log data from the log directory hierarchy. In this notebook, the root log directory is ```logs/scalars```, suffixed by a timestamped subdirectory. The timestamped subdirectory enables you to easily identify and select training runs as you use TensorBoard and iterate on your model.\n",
        " "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VmEQwCon3i7m"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Training ... With default parameters, this takes less than 10 seconds.\n",
            "Average test loss:  0.042797307365108284\n"
          ]
        }
      ],
      "source": [
        "logdir = \"logs/scalars/\" + datetime.now().strftime(\"%Y%m%d-%H%M%S\")\n",
        "tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)\n",
        "\n",
        "model = keras.models.Sequential([\n",
        "    keras.layers.Dense(16, input_dim=1),\n",
        "    keras.layers.Dense(1),\n",
        "])\n",
        "\n",
        "model.compile(\n",
        "    loss='mse', # keras.losses.mean_squared_error\n",
        "    optimizer=keras.optimizers.SGD(learning_rate=0.2),\n",
        ")\n",
        "\n",
        "print(\"Training ... With default parameters, this takes less than 10 seconds.\")\n",
        "training_history = model.fit(\n",
        "    x_train, # input\n",
        "    y_train, # output\n",
        "    batch_size=train_size,\n",
        "    verbose=0, # Suppress chatty output; use Tensorboard instead\n",
        "    epochs=100,\n",
        "    validation_data=(x_test, y_test),\n",
        "    callbacks=[tensorboard_callback],\n",
        ")\n",
        "\n",
        "print(\"Average test loss: \", np.average(training_history.history['loss']))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "042k7GMERVkx"
      },
      "source": [
        "## Examining loss using TensorBoard\n",
        "\n",
        "Now, start TensorBoard, specifying the root log directory you used above.\n",
        "\n",
        "Wait a few seconds for TensorBoard's UI to spin up. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6pck56gKReON"
      },
      "outputs": [],
      "source": [
        "%tensorboard --logdir logs/scalars"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QmQHlG10Kpu2"
      },
      "source": [
        "<!-- <img class=\"tfo-display-only-on-site\" src=\"https://github.com/tensorflow/tensorboard/blob/master/docs/images/scalars_loss.png?raw=1\"/> -->"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ciSIRibhRi6N"
      },
      "source": [
        "You may see TensorBoard display the message \"No dashboards are active for the current data set\". That's because initial logging data hasn't been saved yet. As training progresses, the Keras model will start logging data. TensorBoard will periodically refresh and show you your scalar metrics. If you're impatient, you can tap the Refresh arrow at the top right.\n",
        "\n",
        "As you watch the training progress, note how both training and validation loss rapidly decrease, and then remain stable. In fact, you could have stopped training after 25 epochs, because the training didn't improve much after that point.\n",
        "\n",
        "Hover over the graph to see specific data points. You can also try zooming in with your mouse, or selecting part of them to view more detail.\n",
        "\n",
        "Notice the \"Runs\" selector on the left. A \"run\" represents a set of logs from a round of training, in this case the result of Model.fit(). Developers typically have many, many runs, as they experiment and develop their model over time. \n",
        "\n",
        "Use the Runs selector to choose specific runs, or choose from only training or validation. Comparing runs will help you evaluate which version of your code is solving your problem better.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "finK0GfYyefe"
      },
      "source": [
        "Ok, TensorBoard's loss graph demonstrates that the loss consistently decreased for both training and validation and then stabilized. That means that the model's metrics are likely very good! Now see how the model actually behaves in real life. \n",
        "\n",
        "Given the input data (60, 25, 2), the line *y = 0.5x + 2* should yield (32, 14.5, 3). Does the model agree?"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EuiLgxQstt32"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[[32.148884 ]\n",
            " [14.562463 ]\n",
            " [ 3.0056725]]\n"
          ]
        }
      ],
      "source": [
        "print(model.predict([60, 25, 2]))\n",
        "# True values to compare predictions against: \n",
        "# [[32.0]\n",
        "#  [14.5]\n",
        "#  [ 3.0]]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bom4MdeewRKS"
      },
      "source": [
        "Not bad!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vvwGmJK9XWmh"
      },
      "source": [
        "## Logging custom scalars\n",
        "\n",
        "What if you want to log custom values, such as a [dynamic learning rate](https://www.jeremyjordan.me/nn-learning-rate/)? To do that, you need to use the TensorFlow Summary API.\n",
        "\n",
        "Retrain the regression model and log a custom learning rate. Here's how:\n",
        "\n",
        "1.  Create a file writer, using ```tf.summary.create_file_writer()```.\n",
        "2.  Define a custom learning rate function. This will be passed to the Keras [LearningRateScheduler](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LearningRateScheduler) callback.\n",
        "3.  Inside the learning rate function, use ```tf.summary.scalar()``` to log the custom learning rate.\n",
        "4.  Pass the LearningRateScheduler callback to Model.fit().\n",
        "\n",
        "In general, to log a custom scalar, you need to use ```tf.summary.scalar()``` with a file writer. The file writer is responsible for writing data for this run to the specified directory and is implicitly used when you use the ```tf.summary.scalar()```."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XB95ltRiXVXk"
      },
      "outputs": [],
      "source": [
        "logdir = \"logs/scalars/\" + datetime.now().strftime(\"%Y%m%d-%H%M%S\")\n",
        "file_writer = tf.summary.create_file_writer(logdir + \"/metrics\")\n",
        "file_writer.set_as_default()\n",
        "\n",
        "def lr_schedule(epoch):\n",
        "  \"\"\"\n",
        "  Returns a custom learning rate that decreases as epochs progress.\n",
        "  \"\"\"\n",
        "  learning_rate = 0.2\n",
        "  if epoch > 10:\n",
        "    learning_rate = 0.02\n",
        "  if epoch > 20:\n",
        "    learning_rate = 0.01\n",
        "  if epoch > 50:\n",
        "    learning_rate = 0.005\n",
        "\n",
        "  tf.summary.scalar('learning rate', data=learning_rate, step=epoch)\n",
        "  return learning_rate\n",
        "\n",
        "lr_callback = keras.callbacks.LearningRateScheduler(lr_schedule)\n",
        "tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)\n",
        "\n",
        "model = keras.models.Sequential([\n",
        "    keras.layers.Dense(16, input_dim=1),\n",
        "    keras.layers.Dense(1),\n",
        "])\n",
        "\n",
        "model.compile(\n",
        "    loss='mse', # keras.losses.mean_squared_error\n",
        "    optimizer=keras.optimizers.SGD(),\n",
        ")\n",
        "\n",
        "training_history = model.fit(\n",
        "    x_train, # input\n",
        "    y_train, # output\n",
        "    batch_size=train_size,\n",
        "    verbose=0, # Suppress chatty output; use Tensorboard instead\n",
        "    epochs=100,\n",
        "    validation_data=(x_test, y_test),\n",
        "    callbacks=[tensorboard_callback, lr_callback],\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pck8OQEjayDM"
      },
      "source": [
        "Let's look at TensorBoard again."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0sjM2wXGa0mF"
      },
      "outputs": [],
      "source": [
        "%tensorboard --logdir logs/scalars"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GkIahGZKK9I7"
      },
      "source": [
        "<!-- <img class=\"tfo-display-only-on-site\" src=\"https://github.com/tensorflow/tensorboard/blob/master/docs/images/scalars_custom_lr.png?raw=1\"/> -->"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RRlUDnhlkN_q"
      },
      "source": [
        "Using the \"Runs\" selector on the left, notice that you have a ```<timestamp>/metrics``` run. Selecting this run displays a \"learning rate\" graph that allows you to verify the progression of the learning rate during this run. \n",
        "\n",
        "You can also compare this run's training and validation loss curves against your earlier runs.\n",
        "You might also notice that the learning rate schedule returned discrete values, depending on epoch, but the learning rate plot may appear smooth.  TensorBoard has a smoothing parameter that you may need to turn down to zero to see the unsmoothed values. \n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "l0TTI16Nl0nk"
      },
      "source": [
        "How does this model do?"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "97T4vT3QkQJH"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[[31.958094 ]\n",
            " [14.482997 ]\n",
            " [ 2.9993598]]\n"
          ]
        }
      ],
      "source": [
        "print(model.predict([60, 25, 2]))\n",
        "# True values to compare predictions against: \n",
        "# [[32.0]\n",
        "#  [14.5]\n",
        "#  [ 3.0]]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oxAVfW8lhZ0e"
      },
      "source": [
        "## Batch-level logging\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "p-ltiSuypLVE"
      },
      "source": [
        "First let's load the MNIST dataset, normalize the data and write a function that creates a simple Keras model for classifying the images into 10 classes."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "N2kbowJTpJWJ"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz\n",
            "11493376/11490434 [==============================] - 0s 0us/step\n",
            "11501568/11490434 [==============================] - 0s 0us/step\n"
          ]
        }
      ],
      "source": [
        "mnist = tf.keras.datasets.mnist\n",
        "\n",
        "(x_train, y_train),(x_test, y_test) = mnist.load_data()\n",
        "x_train, x_test = x_train / 255.0, x_test / 255.0\n",
        "\n",
        "def create_model():\n",
        "  return tf.keras.models.Sequential([\n",
        "    tf.keras.layers.Flatten(input_shape=(28, 28)),\n",
        "    tf.keras.layers.Dense(512, activation='relu'),\n",
        "    tf.keras.layers.Dropout(0.2),\n",
        "    tf.keras.layers.Dense(10, activation='softmax')\n",
        "  ])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "68pV5ZRe1iZ7"
      },
      "source": [
        "### Instantaneous batch-level logging"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "K_QYrBgkTsLH"
      },
      "source": [
        "Logging metrics at the batch level instantaneously can show us the level of fluctuation between batches while training in each epoch, which can be useful for debugging."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6hXsbsXDqgvA"
      },
      "source": [
        "Setting up a summary writer to a different log directory:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "7OTD7Vpg2DLv"
      },
      "outputs": [],
      "source": [
        "log_dir = 'logs/batch_level/' + datetime.now().strftime(\"%Y%m%d-%H%M%S\") + '/train'\n",
        "train_writer = tf.summary.create_file_writer(log_dir)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YlVZY2VqrK9D"
      },
      "source": [
        "To enable batch-level logging, custom `tf.summary` metrics should be defined by overriding `train_step()` in the Model's class definition and enclosed in a summary writer context. This can simply be made combined into subclassed Model definitions or can extend to edit our previous Functional API Model, as shown below:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "IGcNr1ZS1xXL"
      },
      "outputs": [],
      "source": [
        "class MyModel(tf.keras.Model):\n",
        "  def __init__(self, model):\n",
        "    super().__init__()\n",
        "    self.model = model\n",
        "  \n",
        "  def train_step(self, data):\n",
        "    x, y = data\n",
        "    with tf.GradientTape() as tape:\n",
        "      y_pred = self.model(x, training=True)\n",
        "      loss = self.compiled_loss(y, y_pred)\n",
        "      mse = tf.keras.losses.mean_squared_error(y, K.max(y_pred, axis=-1))\n",
        "    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)\n",
        "    with train_writer.as_default(step=self._train_counter):\n",
        "      tf.summary.scalar('batch_loss', loss)\n",
        "      tf.summary.scalar('batch_mse', mse)\n",
        "    return self.compute_metrics(x, y, y_pred, None)\n",
        "  \n",
        "  def call(self, x):\n",
        "    x = self.model(x)\n",
        "    return x\n",
        "\n",
        "# Adds custom batch-level metrics to our previous Functional API model\n",
        "model = MyModel(create_model())\n",
        "model.compile(optimizer='adam',\n",
        "              loss='sparse_categorical_crossentropy',\n",
        "              metrics=['accuracy'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8wqqNaXP2bLc"
      },
      "source": [
        "Define our TensorBoard callback to log both epoch-level and batch-level metrics to our log directory and call `model.fit()` with our selected `batch_size`: "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "YXK-iE0e2UOE"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Epoch 1/5\n",
            "120/120 [==============================] - 5s 36ms/step - loss: 0.4379 - accuracy: 0.8788 - val_loss: 0.2041 - val_accuracy: 0.9430\n",
            "Epoch 2/5\n",
            "120/120 [==============================] - 4s 31ms/step - loss: 0.1875 - accuracy: 0.9471 - val_loss: 0.1462 - val_accuracy: 0.9591\n",
            "Epoch 3/5\n",
            "120/120 [==============================] - 3s 27ms/step - loss: 0.1355 - accuracy: 0.9613 - val_loss: 0.1170 - val_accuracy: 0.9670\n",
            "Epoch 4/5\n",
            "120/120 [==============================] - 3s 27ms/step - loss: 0.1058 - accuracy: 0.9694 - val_loss: 0.0954 - val_accuracy: 0.9723\n",
            "Epoch 5/5\n",
            "120/120 [==============================] - 3s 27ms/step - loss: 0.0872 - accuracy: 0.9752 - val_loss: 0.0843 - val_accuracy: 0.9749\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "<keras.callbacks.History at 0x7fce165a2fd0>"
            ]
          },
          "execution_count": 15,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)\n",
        "\n",
        "model.fit(x=x_train, \n",
        "          y=y_train,\n",
        "          epochs=5,\n",
        "          batch_size=500, \n",
        "          validation_data=(x_test, y_test), \n",
        "          callbacks=[tensorboard_callback])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SsxwF83h2kiM"
      },
      "source": [
        "Open TensorBoard with the new log directory and see both the epoch-level and batch-level metrics:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XlcafPNY2oUW"
      },
      "outputs": [],
      "source": [
        "%tensorboard --logdir logs/batch_level"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ReO1LD-g2vgk"
      },
      "source": [
        "### Cumulative batch-level logging"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T0eE_yjt45TN"
      },
      "source": [
        "Batch-level logging can also be implemented cumulatively, averaging each batch's metrics with those of previous batches and resulting in a smoother training curve when logging batch-level metrics."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "He88QGuu25Vm"
      },
      "source": [
        "Setting up a summary writer to a different log directory:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "hX3nsdqi28W1"
      },
      "outputs": [],
      "source": [
        "log_dir = 'logs/batch_avg/' + datetime.now().strftime(\"%Y%m%d-%H%M%S\") + '/train'\n",
        "train_writer = tf.summary.create_file_writer(log_dir)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WHcpdSR6q5MY"
      },
      "source": [
        "Create stateful metrics that can be logged per batch:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0cAiVu_KjOVi"
      },
      "outputs": [],
      "source": [
        "batch_loss = tf.keras.metrics.Mean('batch_loss', dtype=tf.float32)\n",
        "batch_accuracy = tf.keras.metrics.SparseCategoricalAccuracy('batch_accuracy')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f8ESFevo3Vjz"
      },
      "source": [
        "As before, add custom `tf.summary` metrics in the overridden `train_step` method. To make the batch-level logging cumulative, use the stateful metrics we defined to calculate the cumulative result given each training step's data."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "vQ_-46fpjUVl"
      },
      "outputs": [],
      "source": [
        "class MyModel(tf.keras.Model):\n",
        "  def __init__(self, model):\n",
        "    super().__init__()\n",
        "    self.model = model\n",
        "  \n",
        "  def train_step(self, data):\n",
        "    x, y = data\n",
        "    with tf.GradientTape() as tape:\n",
        "      y_pred = self.model(x, training=True)\n",
        "      loss = self.compiled_loss(y, y_pred)\n",
        "    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)\n",
        "    batch_loss(loss)\n",
        "    batch_accuracy(y, y_pred)\n",
        "    with train_writer.as_default(step=self._train_counter):\n",
        "      tf.summary.scalar('batch_loss', batch_loss.result())\n",
        "      tf.summary.scalar('batch_accuracy', batch_accuracy.result())\n",
        "    return self.compute_metrics(x, y, y_pred, None)\n",
        "  \n",
        "  def call(self, x):\n",
        "    x = self.model(x)\n",
        "    return x\n",
        "\n",
        "# Adds custom batch-level metrics to our previous Functional API model\n",
        "model = MyModel(create_model())\n",
        "model.compile(optimizer='adam',\n",
        "              loss='sparse_categorical_crossentropy',\n",
        "              metrics=['accuracy'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NgtSX5E1uNhC"
      },
      "source": [
        "As before, define our TensorBoard callback and call `model.fit()` with our selected `batch_size`: "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SGWYCUFhjztg"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Epoch 1/5\n",
            "120/120 [==============================] - 4s 27ms/step - loss: 0.4266 - accuracy: 0.8813 - val_loss: 0.2055 - val_accuracy: 0.9415\n",
            "Epoch 2/5\n",
            "120/120 [==============================] - 3s 26ms/step - loss: 0.1864 - accuracy: 0.9476 - val_loss: 0.1417 - val_accuracy: 0.9613\n",
            "Epoch 3/5\n",
            "120/120 [==============================] - 3s 27ms/step - loss: 0.1352 - accuracy: 0.9614 - val_loss: 0.1148 - val_accuracy: 0.9665\n",
            "Epoch 4/5\n",
            "120/120 [==============================] - 3s 26ms/step - loss: 0.1066 - accuracy: 0.9702 - val_loss: 0.0932 - val_accuracy: 0.9716\n",
            "Epoch 5/5\n",
            "120/120 [==============================] - 3s 27ms/step - loss: 0.0859 - accuracy: 0.9749 - val_loss: 0.0844 - val_accuracy: 0.9754\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "<keras.callbacks.History at 0x7fce15c39f50>"
            ]
          },
          "execution_count": 26,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)\n",
        "\n",
        "model.fit(x=x_train, \n",
        "          y=y_train,\n",
        "          epochs=5,\n",
        "          batch_size=500, \n",
        "          validation_data=(x_test, y_test), \n",
        "          callbacks=[tensorboard_callback])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jSoDvPoDvAci"
      },
      "source": [
        "Open TensorBoard with the new log directory and see both the epoch-level and batch-level metrics:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kYmYfTeSk7AD"
      },
      "outputs": [],
      "source": [
        "%tensorboard --logdir logs/batch_avg"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LXCePQi7_f50"
      },
      "source": [
        "That's it! You now know how to create custom training metrics in TensorBoard for a wide variety of use cases."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "scalars_and_keras.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
